Simplifying Text Processing with Grammatically Aware Regular Expressions
Abstract
In this paper, we introduce Grammatically Aware Regular Expressions (GARE) and illustrate their use with examples from a task of retrieving moral consequences. GARE extends the regular expression concept and overcomes many of the difficulties of traditional regular expressions by adding normalization (e.g., searching for the base form of a verb or adjective matches all of its inflected forms) and part-of-speech (POS) awareness (e.g., a search can be restricted to adjectives that follow the particle “wa”). We explain how GARE works, what makes it more expressive than traditional regular expressions for natural language, and how it handles a number of matching cases that traditional regular expressions cannot solve on their own.
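The abstract does not show GARE's actual syntax. As a rough illustration of the two ideas it names, normalization and POS awareness, the following Python sketch annotates each token with its surface form, base form, and POS tag, serializes these layers into one string, and translates grammar-aware atoms into ordinary regular expressions. Everything here is an assumption for illustration, not the authors' notation or implementation: the atom syntax (`<s:VALUE>` for surface, `<b:VALUE>` for base form, `<p:VALUE>` for POS, `<*>` for any token), the function names `encode`, `compile_gare`, and `find_all`, the tag set, and the hand-written romanized annotations (a morphological analyzer would normally supply these).

```python
import re
from typing import List, Tuple

# Hypothetical token representation: (surface form, base form, POS tag).
# Assumption: annotations come from a morphological analyzer; here they are hand-written.
Token = Tuple[str, str, str]

# Hypothetical atom syntax: <s:VALUE> surface, <b:VALUE> base form, <p:VALUE> POS, <*> any token.
ATOM = re.compile(r"<(?:([sbp]):([^<>]+)|\*)>")
FIELD = r"[^/ ]+"  # one annotation field; fields may not contain '/' or ' '


def encode(tokens: List[Token]) -> str:
    """Serialize tokens as ' surface/base/POS' so a plain regex can see every layer."""
    return "".join(f" {s}/{b}/{p}" for s, b, p in tokens)


def compile_gare(pattern: str) -> re.Pattern:
    """Replace grammar-aware atoms with regex fragments; other characters pass through as regex operators."""
    def atom_to_regex(m: re.Match) -> str:
        fields = {"s": FIELD, "b": FIELD, "p": FIELD}
        if m.group(1):  # constrained atom such as <b:taberu>
            fields[m.group(1)] = re.escape(m.group(2))
        # The leading space and the lookahead pin the atom to exactly one whole token.
        return f"(?: {fields['s']}/{fields['b']}/{fields['p']}(?= |$))"
    return re.compile(ATOM.sub(atom_to_regex, pattern))


def find_all(pattern: str, tokens: List[Token]) -> List[List[str]]:
    """Return the surface forms of the tokens in every non-overlapping match."""
    regex = compile_gare(pattern)
    return [[t.split("/")[0] for t in m.group(0).split()]
            for m in regex.finditer(encode(tokens))]


if __name__ == "__main__":
    # POS awareness: an adjective directly after the particle "wa".
    sent = [("kare", "kare", "PRON"), ("wa", "wa", "PART"),
            ("yasashikatta", "yasashii", "ADJ")]
    print(find_all("<s:wa><p:ADJ>", sent))
    # Normalization: the base form "taberu" also finds the past form "tabeta".
    sent2 = [("sushi", "sushi", "NOUN"), ("wo", "wo", "PART"),
             ("tabeta", "taberu", "VERB")]
    print(find_all("<b:taberu>", sent2))
```

Serializing all annotation layers into one string is just one simple design choice: it lets the standard regex engine supply quantifiers, alternation, and grouping between atoms (e.g. `<s:wa><*>?<p:ADJ>`) for free, at the cost of forbidding `/` and spaces inside annotation fields. A dedicated token-level matcher would avoid that restriction.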
Similar resources
Sentiment Mining and Indexing in Opinmind
This paper presents a production system that efficiently mines social networking sites for sentiments and indexes the expressions for fast retrieval via a web search interface. Sentiment mining is a computational approach used to identify expressions made about topics within a span of text. Social networks represent a particularly rich corpus for mining sentiments because writers express sentim...
Optimally Streaming Greedy Regular Expression Parsing
We study the problem of streaming regular expression parsing: Given a regular expression and an input stream of symbols, how to output a serialized syntax tree representation as an output stream during input stream processing. We show that optimally streaming regular expression parsing, outputting bits of the output as early as is semantically possible for any regular expression of size m and a...
Simplifying Regular Expressions: A Quantitative Perspective
In this work, we consider the efficient simplification of regular expressions. We suggest a quantitative comparison of heuristics for simplifying regular expressions. We propose a new normal form for regular expressions, which outperforms previous heuristics while still being computable in linear time. We apply this normal form to determine an exact bound for the relation between the two most c...
Domain Specific Text Processing for Speech Synthesis
In Text-to-Speech (TTS) synthesis there are words and expressions that pose problems because some semantic knowledge is required to determine how they should be read out. This work implements a domain filter, a pre-processing module that supports the TTS system by analysing text belonging to a certain semantic domain and rewriting problematic expressions so that they are read out better. The fi...
Simplifying Regular Expressions
We consider the efficient simplification of regular expressions and suggest a quantitative comparison of heuristics for simplifying regular expressions. To this end, we propose a new normal form for regular expressions, which outperforms previous heuristics while still being computable in linear time. This allows us to determine an exact bound for the relation between the two prevalent measures...